Overview

Dataset statistics

Number of variables: 14
Number of observations: 300190
Missing cells: 49201
Missing cells (%): 1.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 32.1 MiB
Average record size in memory: 112.0 B
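
The derived figures in this table can be cross-checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (observations × variables) and that every column occupies 8 bytes per row. The second assumption is inferred from the reported sizes and is not stated by the report itself.

```python
# Raw counts copied from the dataset statistics.
n_variables = 14
n_observations = 300190
missing_cells = 49201

# Share of missing cells over the whole table (rows x columns).
missing_pct = 100 * missing_cells / (n_observations * n_variables)

# Assumption: each column takes 8 bytes per row (64-bit values or pointers).
bytes_per_value = 8
record_bytes = n_variables * bytes_per_value
total_mib = record_bytes * n_observations / 2**20

print(f"missing cells: {missing_pct:.1f}%")
print(f"record size:   {record_bytes} B")
print(f"total size:    {total_mib:.1f} MiB")
```

Under these assumptions the three values round to the figures reported in the table.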

Variable types

Categorical: 4
DateTime: 1
Numeric: 9

Alerts

VERSIE has constant value "1.0" Constant
DATUM_BESTAND has constant value "2022-06-21" Constant
PEILDATUM has constant value "2022-06-01" Constant
TYPERENDE_DIAGNOSE_CD has a high cardinality: 1818 distinct values High cardinality
BEHANDELEND_SPECIALISME_CD is highly correlated with AANTAL_PAT_PER_SPC High correlation
AANTAL_PAT_PER_ZPD is highly correlated with AANTAL_SUBTRAJECT_PER_ZPD High correlation
AANTAL_SUBTRAJECT_PER_ZPD is highly correlated with AANTAL_PAT_PER_ZPD High correlation
AANTAL_PAT_PER_DIAG is highly correlated with AANTAL_SUBTRAJECT_PER_DIAG High correlation
AANTAL_SUBTRAJECT_PER_DIAG is highly correlated with AANTAL_PAT_PER_DIAG High correlation
AANTAL_PAT_PER_SPC is highly correlated with BEHANDELEND_SPECIALISME_CD and 1 other field High correlation
AANTAL_SUBTRAJECT_PER_SPC is highly correlated with AANTAL_PAT_PER_SPC High correlation
AANTAL_PAT_PER_SPC is highly correlated with AANTAL_SUBTRAJECT_PER_SPC High correlation
DATUM_BESTAND is highly correlated with PEILDATUM and 1 other field High correlation
PEILDATUM is highly correlated with DATUM_BESTAND and 1 other field High correlation
VERSIE is highly correlated with DATUM_BESTAND and 1 other field High correlation
JAAR is highly correlated with AANTAL_PAT_PER_SPC and 1 other field High correlation
AANTAL_PAT_PER_SPC is highly correlated with JAAR and 1 other field High correlation
AANTAL_SUBTRAJECT_PER_SPC is highly correlated with JAAR and 1 other field High correlation
GEMIDDELDE_VERKOOPPRIJS has 49201 (16.4%) missing values Missing
AANTAL_SUBTRAJECT_PER_ZPD is highly skewed (γ1 = 21.16688519) Skewed

Reproduction

Analysis started: 2022-07-08 10:11:57.220146
Analysis finished: 2022-07-08 10:12:20.475652
Duration: 23.26 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

VERSIE
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 MiB

Value  Count
1.0    300190

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 900570
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value  Count   Frequency (%)
1.0    300190  100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Most occurring characters

Value  Count   Frequency (%)
1      300190  33.3%
.      300190  33.3%
0      300190  33.3%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     600380  66.7%
Other Punctuation  300190  33.3%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
1      300190  50.0%
0      300190  50.0%
Other Punctuation
Value  Count   Frequency (%)
.      300190  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  900570  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
1      300190  33.3%
.      300190  33.3%
0      300190  33.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  900570  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
1      300190  33.3%
.      300190  33.3%
0      300190  33.3%

DATUM_BESTAND
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 MiB

Value       Count
2022-06-21  300190

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 3001900
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022-06-21
2nd row: 2022-06-21
3rd row: 2022-06-21
4th row: 2022-06-21
5th row: 2022-06-21

Common Values

Value       Count   Frequency (%)
2022-06-21  300190  100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Most occurring characters

Value  Count    Frequency (%)
2      1200760  40.0%
0      600380   20.0%
-      600380   20.0%
6      300190   10.0%
1      300190   10.0%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    2401520  80.0%
Dash Punctuation  600380   20.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
2      1200760  50.0%
0      600380   25.0%
6      300190   12.5%
1      300190   12.5%
Dash Punctuation
Value  Count    Frequency (%)
-      600380   100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  3001900  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
2      1200760  40.0%
0      600380   20.0%
-      600380   20.0%
6      300190   10.0%
1      300190   10.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  3001900  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
2      1200760  40.0%
0      600380   20.0%
-      600380   20.0%
6      300190   10.0%
1      300190   10.0%

PEILDATUM
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 MiB

Value       Count
2022-06-01  300190

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 3001900
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022-06-01
2nd row: 2022-06-01
3rd row: 2022-06-01
4th row: 2022-06-01
5th row: 2022-06-01

Common Values

Value       Count   Frequency (%)
2022-06-01  300190  100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Most occurring characters

Value  Count   Frequency (%)
2      900570  30.0%
0      900570  30.0%
-      600380  20.0%
6      300190  10.0%
1      300190  10.0%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    2401520  80.0%
Dash Punctuation  600380   20.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
2      900570  37.5%
0      900570  37.5%
6      300190  12.5%
1      300190  12.5%
Dash Punctuation
Value  Count   Frequency (%)
-      600380  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  3001900  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
2      900570  30.0%
0      900570  30.0%
-      600380  20.0%
6      300190  10.0%
1      300190  10.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  3001900  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
2      900570  30.0%
0      900570  30.0%
-      600380  20.0%
6      300190  10.0%
1      300190  10.0%

JAAR
Date

HIGH CORRELATION

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 MiB
Minimum: 2012-01-01 00:00:00
Maximum: 2022-01-01 00:00:00
Histogram with fixed size bins (bins=11)

BEHANDELEND_SPECIALISME_CD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 28
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 430.6566208
Minimum: 301
Maximum: 8418
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 301
5-th percentile: 302
Q1: 305
Median: 313
Q3: 322
95-th percentile: 335
Maximum: 8418
Range: 8117
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 958.8150454
Coefficient of variation (CV): 2.226402658
Kurtosis: 65.28052091
Mean: 430.6566208
Median Absolute Deviation (MAD): 8
Skewness: 8.196516408
Sum: 129278811
Variance: 919326.2914
Monotonicity: Not monotonic
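
Several of the descriptive statistics are simple functions of the others, which gives a quick consistency check. The sketch below recomputes the mean, the coefficient of variation and the variance from the figures reported for BEHANDELEND_SPECIALISME_CD. It uses all 300190 rows as the denominator because this variable has no missing values, and the variable names are illustrative only.

```python
# Figures copied from the BEHANDELEND_SPECIALISME_CD statistics.
n = 300190
total = 129278811
std = 958.8150454

mean = total / n       # arithmetic mean = sum / number of observations
cv = std / mean        # coefficient of variation = standard deviation / mean
variance = std ** 2    # variance = squared standard deviation

print(f"mean={mean:.7f}  cv={cv:.9f}  variance={variance:.4f}")
```

Small differences in the last printed digits are expected, since the inputs are themselves rounded.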
Histogram with fixed size bins (bins=28)

Common values
Value              Count  Frequency (%)
305                42433  14.1%
313                38971  13.0%
303                34672  11.6%
330                23836  7.9%
316                20357  6.8%
308                15859  5.3%
306                12562  4.2%
324                12270  4.1%
301                12187  4.1%
304                9758   3.3%
Other values (18)  77285  25.7%

Smallest 10 values
Value  Count  Frequency (%)
301    12187  4.1%
302    6609   2.2%
303    34672  11.6%
304    9758   3.3%
305    42433  14.1%
306    12562  4.2%
307    5262   1.8%
308    15859  5.3%
310    3380   1.1%
313    38971  13.0%

Largest 10 values
Value  Count  Frequency (%)
8418   4133   1.4%
8416   123    < 0.1%
1900   193    0.1%
390    824    0.3%
389    3208   1.1%
362    4143   1.4%
361    2152   0.7%
335    3033   1.0%
330    23836  7.9%
329    796    0.3%

TYPERENDE_DIAGNOSE_CD
Categorical

HIGH CARDINALITY

Distinct: 1818
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 MiB

Value                Count
101                  1283
402                  1248
403                  1208
301                  1207
201                  1145
Other values (1813)  294099

Length

Max length: 4
Median length: 3
Mean length: 3.349488657
Min length: 2

Characters and Unicode

Total characters: 1005483
Distinct characters: 25
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 29
Unique (%): < 0.1%

Sample

1st row: 07
2nd row: 15
3rd row: 20
4th row: 12
5th row: 20

Common Values

Value                Count   Frequency (%)
101                  1283    0.4%
402                  1248    0.4%
403                  1208    0.4%
301                  1207    0.4%
201                  1145    0.4%
203                  1141    0.4%
401                  1020    0.3%
404                  1009    0.3%
409                  982     0.3%
802                  972     0.3%
Other values (1808)  288975  96.3%

Length

Histogram of lengths of the category

Most occurring characters

Value              Count   Frequency (%)
1                  192628  19.2%
0                  184065  18.3%
2                  133257  13.3%
3                  109062  10.8%
5                  77373   7.7%
9                  72579   7.2%
4                  71568   7.1%
7                  59153   5.9%
6                  52397   5.2%
8                  43259   4.3%
Other values (15)  10142   1.0%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    995341  99.0%
Uppercase Letter  10142   1.0%

Most frequent character per category

Uppercase Letter
Value             Count  Frequency (%)
G                 1938   19.1%
M                 1693   16.7%
B                 1213   12.0%
E                 851    8.4%
Z                 828    8.2%
D                 678    6.7%
A                 651    6.4%
F                 646    6.4%
C                 333    3.3%
K                 329    3.2%
Other values (5)  982    9.7%
Decimal Number
Value  Count   Frequency (%)
1      192628  19.4%
0      184065  18.5%
2      133257  13.4%
3      109062  11.0%
5      77373   7.8%
9      72579   7.3%
4      71568   7.2%
7      59153   5.9%
6      52397   5.3%
8      43259   4.3%

Most occurring scripts

Value   Count   Frequency (%)
Common  995341  99.0%
Latin   10142   1.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
G                 1938   19.1%
M                 1693   16.7%
B                 1213   12.0%
E                 851    8.4%
Z                 828    8.2%
D                 678    6.7%
A                 651    6.4%
F                 646    6.4%
C                 333    3.3%
K                 329    3.2%
Other values (5)  982    9.7%
Common
Value  Count   Frequency (%)
1      192628  19.4%
0      184065  18.5%
2      133257  13.4%
3      109062  11.0%
5      77373   7.8%
9      72579   7.3%
4      71568   7.2%
7      59153   5.9%
6      52397   5.3%
8      43259   4.3%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1005483  100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
1                  192628  19.2%
0                  184065  18.3%
2                  133257  13.3%
3                  109062  10.8%
5                  77373   7.7%
9                  72579   7.2%
4                  71568   7.1%
7                  59153   5.9%
6                  52397   5.2%
8                  43259   4.3%
Other values (15)  10142   1.0%

ZORGPRODUCT_CD
Real number (ℝ≥0)

Distinct: 5968
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 439816122.5
Minimum: 10501002
Maximum: 998418081
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 10501002
5-th percentile: 28999037
Q1: 99799034
Median: 149599024
Q3: 990004004
95-th percentile: 990516035.5
Maximum: 998418081
Range: 987917079
Interquartile range (IQR): 890204970

Descriptive statistics

Standard deviation: 428802968.6
Coefficient of variation (CV): 0.9749596403
Kurtosis: -1.732758159
Mean: 439816122.5
Median Absolute Deviation (MAD): 119600018
Skewness: 0.4723035305
Sum: 1.320284018 × 10^14
Variance: 1.838719859 × 10^17
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                Count   Frequency (%)
990004009            2189    0.7%
990004007            2150    0.7%
990003004            2102    0.7%
990004006            1714    0.6%
990356076            1555    0.5%
990356073            1432    0.5%
131999228            1381    0.5%
990003007            1358    0.5%
131999164            1355    0.5%
199299013            1263    0.4%
Other values (5958)  283691  94.5%

Smallest 10 values
Value     Count  Frequency (%)
10501002  8      < 0.1%
10501003  11     < 0.1%
10501004  11     < 0.1%
10501005  11     < 0.1%
10501007  3      < 0.1%
10501008  11     < 0.1%
10501010  11     < 0.1%
10501011  3      < 0.1%
11101002  9      < 0.1%
11101003  11     < 0.1%

Largest 10 values
Value      Count  Frequency (%)
998418081  148    < 0.1%
998418080  133    < 0.1%
998418079  37     < 0.1%
998418077  8      < 0.1%
998418076  8      < 0.1%
998418075  6      < 0.1%
998418074  204    0.1%
998418073  192    0.1%
998418072  8      < 0.1%
998418071  8      < 0.1%

AANTAL_PAT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9770
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 514.3071755
Minimum: 1
Maximum: 164654
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 13
Q3: 101
95-th percentile: 1738
Maximum: 164654
Range: 164653
Interquartile range (IQR): 98

Descriptive statistics

Standard deviation: 3186.890378
Coefficient of variation (CV): 6.196472712
Kurtosis: 396.1231592
Mean: 514.3071755
Median Absolute Deviation (MAD): 12
Skewness: 16.54114341
Sum: 154389871
Variance: 10156270.28
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                Count   Frequency (%)
1                    50459   16.8%
2                    24552   8.2%
3                    15997   5.3%
4                    11675   3.9%
5                    9128    3.0%
6                    7690    2.6%
7                    6449    2.1%
8                    5435    1.8%
9                    4945    1.6%
10                   4364    1.5%
Other values (9760)  159496  53.1%

Smallest 10 values
Value  Count  Frequency (%)
1      50459  16.8%
2      24552  8.2%
3      15997  5.3%
4      11675  3.9%
5      9128   3.0%
6      7690   2.6%
7      6449   2.1%
8      5435   1.8%
9      4945   1.6%
10     4364   1.5%

Largest 10 values
Value   Count  Frequency (%)
164654  1      < 0.1%
155884  1      < 0.1%
154270  1      < 0.1%
151516  1      < 0.1%
144725  1      < 0.1%
137459  1      < 0.1%
118039  1      < 0.1%
115941  1      < 0.1%
110520  1      < 0.1%
109675  1      < 0.1%

AANTAL_SUBTRAJECT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 10484
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 606.8086812
Minimum: 1
Maximum: 239919
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 14
Q3: 110
95-th percentile: 1981
Maximum: 239919
Range: 239918
Interquartile range (IQR): 107

Descriptive statistics

Standard deviation: 4085.892752
Coefficient of variation (CV): 6.73341183
Kurtosis: 712.9122922
Mean: 606.8086812
Median Absolute Deviation (MAD): 13
Skewness: 21.16688519
Sum: 182157898
Variance: 16694519.58
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count   Frequency (%)
1                     48618   16.2%
2                     24142   8.0%
3                     15832   5.3%
4                     11461   3.8%
5                     9058    3.0%
6                     7680    2.6%
7                     6402    2.1%
8                     5388    1.8%
9                     4890    1.6%
10                    4357    1.5%
Other values (10474)  162362  54.1%

Smallest 10 values
Value  Count  Frequency (%)
1      48618  16.2%
2      24142  8.0%
3      15832  5.3%
4      11461  3.8%
5      9058   3.0%
6      7680   2.6%
7      6402   2.1%
8      5388   1.8%
9      4890   1.6%
10     4357   1.5%

Largest 10 values
Value   Count  Frequency (%)
239919  1      < 0.1%
232431  1      < 0.1%
232118  1      < 0.1%
228048  1      < 0.1%
227606  1      < 0.1%
226825  1      < 0.1%
224099  1      < 0.1%
218623  1      < 0.1%
214231  1      < 0.1%
209056  1      < 0.1%

AANTAL_PAT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8609
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7716.614947
Minimum: 1
Maximum: 227540
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 1
5-th percentile: 33
Q1: 380
Median: 1689
Q3: 6363
95-th percentile: 37157
Maximum: 227540
Range: 227539
Interquartile range (IQR): 5983

Descriptive statistics

Standard deviation: 17946.67857
Coefficient of variation (CV): 2.325719074
Kurtosis: 33.45850561
Mean: 7716.614947
Median Absolute Deviation (MAD): 1554
Skewness: 5.024953979
Sum: 2316450641
Variance: 322083271.7
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                Count   Frequency (%)
4                    592     0.2%
9                    591     0.2%
2                    583     0.2%
1                    550     0.2%
3                    543     0.2%
12                   534     0.2%
21                   532     0.2%
14                   526     0.2%
6                    524     0.2%
8                    519     0.2%
Other values (8599)  294696  98.2%

Smallest 10 values
Value  Count  Frequency (%)
1      550    0.2%
2      583    0.2%
3      543    0.2%
4      592    0.2%
5      495    0.2%
6      524    0.2%
7      502    0.2%
8      519    0.2%
9      591    0.2%
10     445    0.1%

Largest 10 values
Value   Count  Frequency (%)
227540  23     < 0.1%
213989  24     < 0.1%
213752  17     < 0.1%
213538  25     < 0.1%
211597  17     < 0.1%
210437  19     < 0.1%
205349  17     < 0.1%
203859  23     < 0.1%
200604  16     < 0.1%
198530  20     < 0.1%

AANTAL_SUBTRAJECT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9538
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11088.68809
Minimum: 1
Maximum: 368506
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 1
5-th percentile: 42
Q1: 503
Median: 2355
Q3: 9075.25
95-th percentile: 52490
Maximum: 368506
Range: 368505
Interquartile range (IQR): 8572.25

Descriptive statistics

Standard deviation: 26613.53588
Coefficient of variation (CV): 2.400061726
Kurtosis: 37.01063881
Mean: 11088.68809
Median Absolute Deviation (MAD): 2184
Skewness: 5.267910555
Sum: 3328713278
Variance: 708280292.1
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                Count   Frequency (%)
3                    515     0.2%
2                    500     0.2%
4                    485     0.2%
1                    471     0.2%
5                    453     0.2%
6                    451     0.2%
10                   448     0.1%
11                   418     0.1%
23                   413     0.1%
12                   409     0.1%
Other values (9528)  295627  98.5%

Smallest 10 values
Value  Count  Frequency (%)
1      471    0.2%
2      500    0.2%
3      515    0.2%
4      485    0.2%
5      453    0.2%
6      451    0.2%
7      394    0.1%
8      399    0.1%
9      379    0.1%
10     448    0.1%

Largest 10 values
Value   Count  Frequency (%)
368506  23     < 0.1%
348526  25     < 0.1%
341695  19     < 0.1%
336643  24     < 0.1%
323792  20     < 0.1%
314672  17     < 0.1%
310782  17     < 0.1%
302580  23     < 0.1%
298651  17     < 0.1%
289045  16     < 0.1%

AANTAL_PAT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 296
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 673983.9904
Minimum: 13
Maximum: 1487650
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 13
5-th percentile: 26921
Q1: 277929
Median: 747032
Q3: 1021519
95-th percentile: 1336157
Maximum: 1487650
Range: 1487637
Interquartile range (IQR): 743590

Descriptive statistics

Standard deviation: 419586.1431
Coefficient of variation (CV): 0.622546157
Kurtosis: -1.11768345
Mean: 673983.9904
Median Absolute Deviation (MAD): 316556
Skewness: -0.05201073889
Sum: 2.023232541 × 10^11
Variance: 1.760525315 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value               Count   Frequency (%)
880959              5102    1.7%
874172              4354    1.5%
843990              4348    1.4%
894394              4333    1.4%
880551              4273    1.4%
894928              4212    1.4%
754549              4082    1.4%
1084060             3890    1.3%
1101091             3864    1.3%
1063588             3851    1.3%
Other values (286)  257881  85.9%

Smallest 10 values
Value  Count  Frequency (%)
13     3      < 0.1%
117    54     < 0.1%
125    41     < 0.1%
248    58     < 0.1%
359    38     < 0.1%
393    123    < 0.1%
771    159    0.1%
933    226    0.1%
1611   130    < 0.1%
1623   133    < 0.1%

Largest 10 values
Value    Count  Frequency (%)
1487650  2975   1.0%
1450424  3048   1.0%
1421817  3564   1.2%
1345182  3543   1.2%
1336157  3439   1.1%
1332850  3546   1.2%
1317329  3463   1.2%
1283064  3577   1.2%
1265255  1177   0.4%
1262552  1201   0.4%

AANTAL_SUBTRAJECT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 296
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1085856.523
Minimum: 13
Maximum: 2666840
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 13
5-th percentile: 39189
Q1: 483127
Median: 1080458
Q3: 1729091
95-th percentile: 2557598
Maximum: 2666840
Range: 2666827
Interquartile range (IQR): 1245964

Descriptive statistics

Standard deviation: 741431.7155
Coefficient of variation (CV): 0.6828081796
Kurtosis: -0.8569613786
Mean: 1085856.523
Median Absolute Deviation (MAD): 648633
Skewness: 0.2941317302
Sum: 3.259632695 × 10^11
Variance: 5.497209888 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value               Count   Frequency (%)
1211813             5102    1.7%
1281617             4354    1.5%
1216294             4348    1.4%
1315716             4333    1.4%
1300626             4273    1.4%
1336490             4212    1.4%
1136773             4082    1.4%
2557598             3890    1.3%
2666840             3864    1.3%
2488271             3851    1.3%
Other values (286)  257881  85.9%

Smallest 10 values
Value  Count  Frequency (%)
13     3      < 0.1%
117    54     < 0.1%
126    41     < 0.1%
248    58     < 0.1%
367    38     < 0.1%
397    123    < 0.1%
783    159    0.1%
1010   226    0.1%
1781   71     < 0.1%
1855   133    < 0.1%

Largest 10 values
Value    Count  Frequency (%)
2666840  3864   1.3%
2603380  3845   1.3%
2578573  3769   1.3%
2557598  3890   1.3%
2488271  3851   1.3%
2184164  3757   1.3%
2178821  3635   1.2%
2066228  3810   1.3%
2045007  1169   0.4%
1990307  1167   0.4%

GEMIDDELDE_VERKOOPPRIJS
Real number (ℝ≥0)

MISSING

Distinct: 3395
Distinct (%): 1.4%
Missing: 49201
Missing (%): 16.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 3554.785528
Minimum: 70
Maximum: 287220
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 70
5-th percentile: 140
Q1: 470
Median: 1250
Q3: 4135
95-th percentile: 13425
Maximum: 287220
Range: 287150
Interquartile range (IQR): 3665

Descriptive statistics

Standard deviation: 6543.689664
Coefficient of variation (CV): 1.840811383
Kurtosis: 155.0190446
Mean: 3554.785528
Median Absolute Deviation (MAD): 1020
Skewness: 7.434401821
Sum: 892212065
Variance: 42819874.42
Monotonicity: Not monotonic
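
Because GEMIDDELDE_VERKOOPPRIJS is the only variable with missing values, it is worth checking which denominator each statistic uses. The sketch below recomputes three of them from the reported counts. The reading that the distinct percentage and the mean are taken over non-missing rows only is an inference from the arithmetic, not something the report states.

```python
# Figures copied from the GEMIDDELDE_VERKOOPPRIJS statistics.
n_rows = 300190
n_missing = 49201
n_distinct = 3395
total = 892212065

n_present = n_rows - n_missing               # rows with a recorded price

missing_pct = 100 * n_missing / n_rows       # share of all rows
distinct_pct = 100 * n_distinct / n_present  # share of non-missing rows
mean = total / n_present                     # missing rows are ignored

print(f"non-missing rows: {n_present}")
print(f"missing: {missing_pct:.1f}%  distinct: {distinct_pct:.1f}%")
print(f"mean: {mean:.6f}")
```

Dividing the distinct count by all 300190 rows instead would give about 1.1%, which does not match the reported 1.4%.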
Histogram with fixed size bins (bins=50)

Common values
Value                Count   Frequency (%)
160                  1845    0.6%
105                  1837    0.6%
110                  1779    0.6%
180                  1485    0.5%
145                  1425    0.5%
300                  1323    0.4%
165                  1258    0.4%
125                  1258    0.4%
185                  1249    0.4%
140                  1233    0.4%
Other values (3385)  236297  78.7%
(Missing)            49201   16.4%

Smallest 10 values
Value  Count  Frequency (%)
70     227    0.1%
75     75     < 0.1%
80     362    0.1%
85     917    0.3%
90     677    0.2%
95     665    0.2%
100    896    0.3%
105    1837   0.6%
110    1779   0.6%
115    897    0.3%

Largest 10 values
Value   Count  Frequency (%)
287220  8      < 0.1%
148910  3      < 0.1%
142835  4      < 0.1%
122155  4      < 0.1%
116765  3      < 0.1%
109725  7      < 0.1%
108570  7      < 0.1%
107655  4      < 0.1%
101270  8      < 0.1%
95465   7      < 0.1%

Interactions

[Interaction plots: 81 Matplotlib figures (not reproduced), consistent with one panel per ordered pair of the 9 numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
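
The calculation described above can be sketched in plain Python. This is a minimal illustration rather than the code the report ran: the helper names are made up for the example, and ties are handled by assigning average ranks, which is a common convention assumed here.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but nonlinear in x
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

For this monotonic but nonlinear relationship, ρ comes out at 1 (up to floating-point rounding) while r stays below 1.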

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only its direction determines the sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
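
The formula above can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the code used to produce the report: the function name `pearson` and the sample values are made up for the example. The common 1/n factors in the covariance and the standard deviations cancel, so they are omitted.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Shifting and rescaling x (with a positive factor) leaves r unchanged.
x_rescaled = [3 * v + 7 for v in x]
print(pearson(x, y), pearson(x_rescaled, y))

# A monotonic but nonlinear relationship gives r below 1.
print(pearson(x, [v ** 3 for v in x]))
```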

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
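
A minimal sketch of the pair-counting definition above follows. It implements the simple form as stated in the text (often called tau-a), with tied pairs counted as neither concordant nor discordant. Statistical libraries often report a tie-adjusted variant instead, so results can differ when the data contain ties. The function name `kendall_tau_a` is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair in opposite ways
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Six pairs in total: five concordant, one discordant (the 2nd and 3rd points).
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The double loop over pairs is quadratic in the number of observations, which is fine for illustration but slow for a dataset of this size.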

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
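
The sketch below illustrates one way to compute a bias-corrected Cramér's V in pure Python, following the correction commonly attributed to Bergsma (2013): the chi-squared based φ² is reduced by (k-1)(r-1)/(n-1), and the row and column counts r and k are shrunk in the same spirit. It is an illustration under those assumptions, not the report generator's code; the function name `cramers_v_corrected` and the sample labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))  # observed cell counts
    rows = Counter(a)           # row marginals
    cols = Counter(b)           # column marginals

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row_label, row_total in rows.items():
        for col_label, col_total in cols.items():
            expected = row_total * col_total / n
            observed = joint.get((row_label, col_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(rows), len(cols)
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    # A constant variable (single category) carries no association information.
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Independent labels give 0; identical labels give 1 (up to rounding).
print(cramers_v_corrected(["x", "x", "y", "y"], ["p", "q", "p", "q"]))
labels = ["x"] * 50 + ["y"] * 50
print(cramers_v_corrected(labels, labels))
```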

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram lets you more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

VERSIE DATUM_BESTAND PEILDATUM JAAR BEHANDELEND_SPECIALISME_CD TYPERENDE_DIAGNOSE_CD ZORGPRODUCT_CD AANTAL_PAT_PER_ZPD AANTAL_SUBTRAJECT_PER_ZPD AANTAL_PAT_PER_DIAG AANTAL_SUBTRAJECT_PER_DIAG AANTAL_PAT_PER_SPC AANTAL_SUBTRAJECT_PER_SPC GEMIDDELDE_VERKOOPPRIJS
01.02022-06-212022-06-012018-01-01329079900290102272341439150021988241771345.0
11.02022-06-212022-06-012018-01-0132915990029002149150102910752198824177205.0
21.02022-06-212022-06-012018-01-013292099002901122442198824177545.0
31.02022-06-212022-06-012018-01-0132912990029010141611412021988241771345.0
41.02022-06-212022-06-012018-01-0132920990029010224421988241771345.0
51.02022-06-212022-06-012018-01-0132901990029010535433134821988241771345.0
61.02022-06-212022-06-012018-01-01329039900290113753798478632198824177545.0
71.02022-06-212022-06-012018-01-0132917990029010737369570421988241771345.0
81.02022-06-212022-06-012018-01-01329059900290102562611284134521988241771345.0
91.02022-06-212022-06-012018-01-013290299002901113751386487249772198824177545.0

Last rows

VERSIE DATUM_BESTAND PEILDATUM JAAR BEHANDELEND_SPECIALISME_CD TYPERENDE_DIAGNOSE_CD ZORGPRODUCT_CD AANTAL_PAT_PER_ZPD AANTAL_SUBTRAJECT_PER_ZPD AANTAL_PAT_PER_DIAG AANTAL_SUBTRAJECT_PER_DIAG AANTAL_PAT_PER_SPC AANTAL_SUBTRAJECT_PER_SPC GEMIDDELDE_VERKOOPPRIJS
3001801.02022-06-212022-06-012014-01-013137799790030092377712391038891206622822695.0
3001811.02022-06-212022-06-012016-01-01303238199299064111255913887133285018337116200.0
3001821.02022-06-212022-06-012016-01-0130382517979901533121713328501833711135.0
3001831.02022-06-212022-06-012016-01-013034029728020981161778116133285018337113935.0
3001841.02022-06-212022-06-012015-01-01322130397280209311257397285144220775996911610.0
3001851.02022-06-212022-06-012016-01-013032691992990611145415249133285018337112815.0
3001861.02022-06-212022-06-012014-01-0131384229499056332105651010388912066228NaN
3001871.02022-06-212022-06-012016-01-013032511992990901165836750133285018337112655.0
3001881.02022-06-212022-06-012012-01-01303359201170201160177168148765019395951330.0
3001891.02022-06-212022-06-012022-01-0130601414959902511131332983419NaN